
16 research outputs found

MetaLangCORP: Presenting the first corpus of media metalanguage in Slovenian, Croatian and Serbian, and the possibilities of its interdisciplinary application

Author: Ksenija Bogetić
Publication venue: 'Faculty of Humanities and Social Sciences University of Rijeka'
Publication date: 01/01/2021

Growing interest in metalanguage, in linguistics and other disciplines, has highlighted a gap in metalanguage corpora and analytical resources, which remain among the scarcest in corpus-linguistic developments so far. This paper aims to take a step towards filling this gap, both by presenting our own metalanguage corpus resource and by using it in a short sample analysis to discuss the applications of such resources in linguistics and social sciences. Specifically, the paper presents for the first time MetaLangCORP, a multielement corpus of contemporary media metalanguage in languages of three post-Yugoslav states, linguistically annotated and made available open-access at the CLARIN repository of linguistic resources. To put the corpus in context, the meaning and relevance of metalanguage research is outlined, the existing efforts at compiling corpora of metalanguage are reviewed, and a sample preliminary analysis of MetaLangCORP keywords is presented to open a broader discussion on the potential applicability of metalanguage corpora. More broadly, it is hoped that making this kind of data available will prompt more nuanced analyses of metalanguage, as well as more corpus-building efforts along similar lines in Slavic and other linguistic scholarship.

HRČAK - Portal of Croatian Scientific and Professional Journals
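
The abstract above mentions a preliminary keyword analysis but does not say which keyness statistic was used. As a rough illustration only, the sketch below ranks words by log-likelihood keyness, a measure commonly used in corpus linguistics; the choice of statistic, the function names and the toy token lists are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a = frequency in the study corpus, b = frequency in the reference corpus,
    c = size of the study corpus,      d = size of the reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=10):
    """Return the top_n words that are relatively more frequent in the study corpus.

    Both token lists are assumed to be non-empty.
    """
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only positive keywords (over-represented in the study corpus).
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data invented for illustration; real input would be corpus tokens or lemmas.
study = ["jezik"] * 5 + ["i"] * 5
reference = ["jezik"] * 1 + ["i"] * 9
print(keywords(study, reference))
```

In practice the reference corpus would be a general-language corpus of the same language, and lemmas rather than raw word forms are often the better unit for morphologically rich languages.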


Metaphors of English and Serbian language in British and Serbian newspaper discourse

Author: Bogetić Ksenija
Publication venue: University of Belgrade, Faculty of Philology (Универзитет у Београду, Филолошки факултет)
Publication date: 20/09/2016

This thesis explores the metaphorical representations of the English language and the Serbian language in British and Serbian daily newspapers, with a twofold aim. The first aim was to further the understanding of media representations of language, by analyzing the metaphors involved in the conceptualization, as well as the discursive construction, of the two languages. The second aim, stemming from such a perspective, was a theoretical and methodological one: examining the possibility of an integrated discourse-cognitive approach to metaphor, along with a better understanding of the relations between metaphorical conceptualization and discursive meaning. The paper develops a new concept of discursive metaphorical frames, as an analytical apparatus for capturing the social meanings of metaphor. For analytical purposes, a corpus of newspaper texts on the topic of language from five daily British newspapers (59,336 words) and five daily Serbian newspapers (67,218 words) was compiled...

National Repository of Dissertations in Serbia (NaRDuS)


Compiling corpora in the digital humanities for languages with limited resources: on the practice of compiling thematic corpora from digital media for Serbian, Croatian and Slovenian

Authors: Batanović Vuk, Bogetić Ksenija, Ljubešić Nikola
Publication venue: 'Hrvatsko filolosko drustvo (Croatian Philological Society)'
Publication date: 01/01/2022

The digital era has unlocked unprecedented possibilities for compiling corpora of social discourse, which has brought corpus linguistic methods into closer interaction with other methods of discourse analysis and the humanities. Even when no specific corpus-linguistic techniques are used, empirically grounded social-scientific analysis increasingly draws on some sort of corpus (sometimes dubbed ‘corpus-assisted discourse analysis’ or ‘corpus-based critical discourse analysis’, cf. Hardt-Mautner 1995; Baker 2016). In the post-Yugoslav space, recent corpus developments have brought table-turning advantages in many areas of discourse research, along with an ongoing proliferation of corpora and tools. Still, for linguists and discourse analysts who embark on collecting specialized corpora for their own research purposes, many questions persist, partly due to the fast-changing background of these issues, but also because there is still a gap in the corpus method, and in guidelines for corpus compilation, when applied beyond anglophone contexts. In this paper we aim to discuss some possible solutions to these difficulties by presenting a step-by-step account of a corpus building procedure specifically for Croatian, Serbian and Slovenian, through an example of compiling a thematic corpus from digital media sources (news articles and reader comments). Following an overview of corpus types, uses and advantages in social sciences and digital humanities, we present the corpus compilation possibilities in the South Slavic language contexts, including data scraping options, permissions and ethical issues, the factors that facilitate or complicate automated collection, and corpus annotation and processing possibilities. The study shows expanding possibilities for work with the given languages, but also some persistently grey areas where researchers need to make decisions based on research expectations. Overall, the paper aims to recapitulate our own corpus compilation experience in the wider context of South Slavic corpus linguistics and corpus linguistic approaches in the humanities more generally.

HRČAK - Portal of Croatian Scientific and Professional Journals


Annotated corpus of Serbian language-related news comments MetaLangNEWS-COMMENTS-Sr

Authors: Batanović Vuk, Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of user comments on online news articles on the topic of language from major Serbian daily newspapers and news portals, published in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, as well as the specific contemporary debates on the defining, naming, and standardisation of language, from the bottom-up perspective. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of non-standard Serbian. The corpus is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. This collection is complementary to the corpus of news articles MetaLangNEWS-Sr (http://hdl.handle.net/11356/1371). Parallel versions from Croatia (http://hdl.handle.net/11356/1370) and Slovenia (http://hdl.handle.net/11356/1362) are also available.

Common Language Resources and Technology Infrastructure - Slovenia
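
The MetaLangNEWS corpora listed here are distributed, among other formats, as tagged CoNLL-U. As a minimal sketch of how such a file might be read without extra dependencies, the code below parses the standard ten-column CoNLL-U layout and counts lemmas, optionally restricted to one part of speech. The helper names and the sample sentence are invented for illustration and are not taken from the corpus or its documentation; the actual files may carry additional metadata comments that this sketch simply skips.

```python
from collections import Counter

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # comment / metadata line
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            continue                  # skip malformed rows
        if "-" in cols[0] or "." in cols[0]:
            continue                  # skip multiword ranges and empty nodes
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


def lemma_counts(sentences, upos=None):
    """Count lowercased lemmas, optionally only for tokens with the given UPOS tag."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence:
            if upos is None or token["upos"] == upos:
                counts[token["lemma"].lower()] += 1
    return counts


# Hand-made sample sentence (hypothetical annotation, not corpus data).
SAMPLE = "\n".join([
    "# text = Jezik je živ.",
    "\t".join(["1", "Jezik", "jezik", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "je", "biti", "AUX", "_", "_", "3", "cop", "_", "_"]),
    "\t".join(["3", "živ", "živ", "ADJ", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

print(lemma_counts(read_conllu(SAMPLE), upos="NOUN"))
```

For a real file, the text would be read with `open(path, encoding="utf-8").read()`; a dedicated CoNLL-U library may be preferable for larger workflows.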

Annotated corpus of Serbian language-related news articles MetaLangNEWS-Sr

Authors: Batanović Vuk, Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of news articles on the topic of language, published in major Serbian daily newspapers and news portals in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, as well as the specific contemporary debates on the defining, naming, and standardisation of language, ongoing in post-Yugoslav societies. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of standard Serbian. The corpus is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. MetaLangNEWS-Sr is complemented with a separate corpus of citizen metalanguage comments, i.e. online comments to the news articles, available as MetaLangNEWS-COMMENTS-Sr (http://hdl.handle.net/11356/1372). Parallel versions from Slovenia (http://hdl.handle.net/11356/1360) and Croatia (http://hdl.handle.net/11356/1369) are also available.

Common Language Resources and Technology Infrastructure - Slovenia

Annotated corpus of Croatian language-related news articles MetaLangNEWS-Hr

Authors: Batanović Vuk, Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of news articles on the topic of language, published in major Croatian daily newspapers and news portals in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, as well as the specific contemporary debates on the defining, naming, and standardisation of language, ongoing in post-Yugoslav societies. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of standard Croatian. The corpus is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. MetaLangNEWS-Hr is complemented with a separate corpus of citizen metalanguage comments, i.e. online comments to the news articles, available as MetaLangNEWS-COMMENTS-Hr (http://hdl.handle.net/11356/1370). Parallel versions from Slovenia (http://hdl.handle.net/11356/1360) and Serbia (http://hdl.handle.net/11356/1371) are also available.

Common Language Resources and Technology Infrastructure - Slovenia

Annotated corpus of Croatian language-related news comments MetaLangNEWS-COMMENTS-Hr

Authors: Batanović Vuk, Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of user comments on online news articles on the topic of language from major Croatian daily newspapers and news portals, published in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, as well as the specific contemporary debates on the defining, naming, and standardisation of language, from the bottom-up perspective. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of non-standard Croatian. The corpus is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. This collection is complementary to the corpus of news articles MetaLangNEWS-Hr (http://hdl.handle.net/11356/1369). Parallel versions from Slovenia (http://hdl.handle.net/11356/1362) and Serbia (http://hdl.handle.net/11356/1372) are also available.

Common Language Resources and Technology Infrastructure - Slovenia

Annotated corpus of Slovenian language-related news articles MetaLangNEWS-Sl

Authors: Batanović Vuk, Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of news articles on the topic of language, published in major Slovenian daily newspapers and news portals in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, as well as the specific contemporary debates on the defining, naming, and standardisation of language, ongoing in post-Yugoslav societies. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of standard Slovenian. The corpus is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. MetaLangNEWS-Sl is complemented with a separate corpus of citizen metalanguage comments, i.e. online comments to the news articles, available as MetaLangNEWS-COMMENTS-Sl (http://hdl.handle.net/11356/1362). Parallel versions from Croatia (http://hdl.handle.net/11356/1369) and Serbia (http://hdl.handle.net/11356/1371) are also available.

Common Language Resources and Technology Infrastructure - Slovenia

Annotated corpus of Slovenian language-related news comments MetaLangNEWS-COMMENTS-Sl

Authors: Batanović Vuk, Bogetić Ksenija
Publication venue: Regional Linguistic Data Initiative Centre ReLDI
Publication date: 30/10/2020

A comprehensive corpus of user comments on online news articles on the topic of language from major Slovenian daily newspapers and news portals, published in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, as well as the specific contemporary debates on the defining, naming, and standardisation of language, from the bottom-up perspective. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of non-standard Slovenian. The corpus is available as plain text, as XML with full metadata, and in tagged CoNLL-U format. This collection is complementary to the corpus of news articles MetaLangNEWS-Sl (http://hdl.handle.net/11356/1360). Parallel versions from Croatia (http://hdl.handle.net/11356/1370) and Serbia (http://hdl.handle.net/11356/1372) are also available.

Common Language Resources and Technology Infrastructure - Slovenia

Cognitive Processing of Syntactically Ambiguous Constructions: Insights from the Processing of Constructions with a Globally Ambiguous Relative Clause

Authors: Bogetić Ksenija, Vladisavljević Marko
Publication venue: 'Faculty of Humanities and Social Sciences University of Rijeka'
Publication date: 01/01/2021

In psycholinguistics and cognitive linguistics, mechanisms of processing ambiguous linguistic constructions have attracted great attention, since understanding them can tell us a lot about the fundamental processes of language comprehension: linking concepts into a single coherent representation. From the wide area of research on the topic of cognitive processing, this paper focuses on one type of ambiguity for which contradictory principles of processing have been established in different languages: the syntactic ambiguity of the relative clause, the most common and the most controversial sentence construction in research on language processing to date. The text offers an overview of empirical findings, hypotheses, and implications for wider models of cognitive processing, including their contradictions and methodological questions deserving further consideration. The concluding remarks sum up the findings, drawbacks and needed directions of future research.

HRČAK - Portal of Croatian Scientific and Professional Journals
